Overview

Dataset statistics

Number of variables: 21
Number of observations: 2117035
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 1.8 GiB
Average record size in memory: 912.8 B
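
The average record size is the total memory footprint divided by the number of observations. The sketch below redoes that arithmetic from the figures above. It assumes GiB means 2**30 bytes; because the total is rounded to one decimal place, the result only approximates the reported 912.8 B.

```python
# Figures copied from the dataset statistics above.
total_size_gib = 1.8          # rounded to one decimal in the report
n_observations = 2_117_035

# Assumption: GiB is the binary unit (2**30 bytes).
total_bytes = total_size_gib * 2**30
avg_record_bytes = total_bytes / n_observations

print(f"{avg_record_bytes:.1f} B per record")
```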

Variable types

Numeric: 8
DateTime: 1
Categorical: 12

Alerts

store_name has a high cardinality: 1560 distinct values (High cardinality)
deposit/distributor has a high cardinality: 73 distinct values (High cardinality)
municipality/city has a high cardinality: 310 distinct values (High cardinality)
sku is highly correlated with unit_price and 1 other fields (High correlation)
units is highly correlated with total and 1 other fields (High correlation)
unit_price is highly correlated with sku and 1 other fields (High correlation)
unit_margin is highly correlated with sku and 1 other fields (High correlation)
total is highly correlated with units and 1 other fields (High correlation)
total_margin is highly correlated with units and 1 other fields (High correlation)
unit_price is highly correlated with unit_margin (High correlation)
unit_margin is highly correlated with unit_price (High correlation)
division is highly correlated with grade and 3 other fields (High correlation)
sku_subcategory is highly correlated with sku_description and 1 other fields (High correlation)
format is highly correlated with supplier (High correlation)
grade is highly correlated with division and 3 other fields (High correlation)
supplier is highly correlated with format (High correlation)
sku_description is highly correlated with sku_subcategory and 1 other fields (High correlation)
state is highly correlated with division and 3 other fields (High correlation)
structure is highly correlated with division and 3 other fields (High correlation)
sku_category is highly correlated with sku_subcategory and 1 other fields (High correlation)
deposit/distributor is highly correlated with division and 3 other fields (High correlation)
store_id is highly correlated with format and 2 other fields (High correlation)
format is highly correlated with store_id and 2 other fields (High correlation)
division is highly correlated with structure and 3 other fields (High correlation)
deposit/distributor is highly correlated with store_id and 5 other fields (High correlation)
state is highly correlated with structure and 3 other fields (High correlation)
supplier is highly correlated with store_id and 1 other fields (High correlation)
grade is highly correlated with structure and 3 other fields (High correlation)
sku is highly correlated with sku_description and 4 other fields (High correlation)
sku_description is highly correlated with sku and 5 other fields (High correlation)
sku_category is highly correlated with sku and 5 other fields (High correlation)
sku_subcategory is highly correlated with sku and 5 other fields (High correlation)
units is highly correlated with sku_description and 5 other fields (High correlation)
unit_price is highly correlated with sku and 4 other fields (High correlation)
unit_margin is highly correlated with sku and 5 other fields (High correlation)
id has unique values (Unique)
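
The high-cardinality alerts can be reproduced from the distinct counts alone. The sketch below uses the distinct counts listed for the categorical variables in this report and flags those above a threshold. The threshold of 50 is an assumption (the actual value lives in the report's config.json); it happens to reproduce the three alerts listed above.

```python
# Distinct counts for the categorical variables, copied from this report.
distinct_counts = {
    "format": 10,
    "store_name": 1560,
    "structure": 3,
    "division": 11,
    "deposit/distributor": 73,
    "state": 29,
    "municipality/city": 310,
    "supplier": 2,
    "grade": 6,
    "sku_description": 34,
    "sku_category": 2,
    "sku_subcategory": 2,
}

# Assumption: a categorical variable is flagged when it has more
# than 50 distinct values. Check config.json for the real setting.
CARDINALITY_THRESHOLD = 50


def high_cardinality(counts, threshold=CARDINALITY_THRESHOLD):
    """Return the names of variables whose distinct count exceeds the threshold."""
    return sorted(name for name, n in counts.items() if n > threshold)


flagged = high_cardinality(distinct_counts)
print(flagged)
```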

Reproduction

Analysis started: 2021-10-17 16:00:58.987346
Analysis finished: 2021-10-17 16:03:46.338073
Duration: 2 minutes and 47.35 seconds
Software version: pandas-profiling v3.1.0
Download configuration: config.json
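
The reported duration is simply the difference between the two timestamps. A minimal check using only the standard library, with the timestamps copied from the lines above:

```python
from datetime import datetime

# Timestamps copied from the Reproduction section.
started = datetime.fromisoformat("2021-10-17 16:00:58.987346")
finished = datetime.fromisoformat("2021-10-17 16:03:46.338073")

elapsed = finished - started
minutes, seconds = divmod(elapsed.total_seconds(), 60)

print(f"{int(minutes)} minutes and {seconds:.2f} seconds")
```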

Variables

id
Real number (ℝ≥0)

UNIQUE

Distinct: 2117035
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1249979.587
Minimum: 0
Maximum: 2501729
Zeros: 1
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 24.2 MiB

Quantile statistics

Minimum: 0
5-th percentile: 123203.7
Q1: 622100.5
Median: 1250273
Q3: 1878664.5
95-th percentile: 2377093.3
Maximum: 2501729
Range: 2501729
Interquartile range (IQR): 1256564

Descriptive statistics

Standard deviation: 723773.447
Coefficient of variation (CV): 0.5790282134
Kurtosis: -1.205380782
Mean: 1249979.587
Median Absolute Deviation (MAD): 628288
Skewness: 0.000768806663
Sum: 2.646250535 × 10^12
Variance: 5.238480026 × 10^11
Monotonicity: Strictly increasing
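
Several of these figures are derived from others in the same section: CV is the standard deviation over the mean, IQR is Q3 minus Q1, variance is the squared standard deviation, and the sum is the mean times the number of observations. The sketch below recomputes them from the rounded values reported for id, so small differences in the last digits are expected.

```python
# Values copied from the id variable's statistics.
mean = 1249979.587
std = 723773.447
q1, q3 = 622100.5, 1878664.5
minimum, maximum = 0, 2501729
n = 2117035

cv = std / mean                    # coefficient of variation
iqr = q3 - q1                      # interquartile range
value_range = maximum - minimum    # range
variance = std ** 2                # variance
total = mean * n                   # sum of all values

print(f"CV={cv:.10f} IQR={iqr} range={value_range}")
print(f"variance={variance:.6e} sum={total:.6e}")
```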
Histogram with fixed size bins (bins=50)
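
A fixed-size-bin histogram splits the range between the minimum and the maximum into equal-width intervals. The sketch below shows one way to compute the 50 bin edges for id and to locate a value's bin. The helper names are hypothetical, and the convention that the last bin includes the maximum is an assumption borrowed from how numpy.histogram treats its final bin.

```python
def bin_edges(lo, hi, bins):
    """Return bins + 1 equally spaced edges covering [lo, hi]."""
    width = (hi - lo) / bins
    return [lo + i * width for i in range(bins + 1)]


def bin_index(x, lo, hi, bins):
    """Return the zero-based bin that x falls into; the last bin is closed."""
    if not lo <= x <= hi:
        raise ValueError("x lies outside the histogram range")
    width = (hi - lo) / bins
    return min(int((x - lo) / width), bins - 1)


# Minimum, maximum and median of id, copied from the report.
edges = bin_edges(0, 2501729, 50)
median_bin = bin_index(1250273, 0, 2501729, 50)
print(len(edges), median_bin)
```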
Common values

Value | Count | Frequency (%)
2047 | 1 | < 0.1%
2143932 | 1 | < 0.1%
2201240 | 1 | < 0.1%
2203289 | 1 | < 0.1%
2197146 | 1 | < 0.1%
2199195 | 1 | < 0.1%
2209436 | 1 | < 0.1%
2205342 | 1 | < 0.1%
2121377 | 1 | < 0.1%
2115234 | 1 | < 0.1%
Other values (2117025) | 2117025 | > 99.9%

Minimum 10 values

Value | Count | Frequency (%)
0 | 1 | < 0.1%
1 | 1 | < 0.1%
2 | 1 | < 0.1%
4 | 1 | < 0.1%
5 | 1 | < 0.1%
8 | 1 | < 0.1%
9 | 1 | < 0.1%
10 | 1 | < 0.1%
11 | 1 | < 0.1%
13 | 1 | < 0.1%

Maximum 10 values

Value | Count | Frequency (%)
2501729 | 1 | < 0.1%
2501728 | 1 | < 0.1%
2501726 | 1 | < 0.1%
2501725 | 1 | < 0.1%
2501724 | 1 | < 0.1%
2501723 | 1 | < 0.1%
2501722 | 1 | < 0.1%
2501720 | 1 | < 0.1%
2501719 | 1 | < 0.1%
2501718 | 1 | < 0.1%

date
Date

Distinct: 446
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 32.3 MiB
Minimum: 2016-01-02 00:00:00
Maximum: 2017-12-05 00:00:00
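
With 446 distinct dates between the minimum and the maximum, the column does not cover every calendar day in that span. The sketch below works out the span and the share of days that appear at least once, using only the three figures reported above.

```python
from datetime import date

# Figures copied from the date variable's statistics.
first, last = date(2016, 1, 2), date(2017, 12, 5)
distinct_dates = 446

# Calendar days in the closed interval [first, last].
calendar_days = (last - first).days + 1
coverage = distinct_dates / calendar_days

print(f"{calendar_days} calendar days, {coverage:.1%} of them present")
```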
Histogram with fixed size bins (bins=50)

store_id
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 1591
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2653.210929
Minimum: 2
Maximum: 5855
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 32.3 MiB

Quantile statistics

Minimum: 2
5-th percentile: 166
Q1: 1640
Median: 2734
Q3: 3785
95-th percentile: 5443
Maximum: 5855
Range: 5853
Interquartile range (IQR): 2145

Descriptive statistics

Standard deviation: 1375.886199
Coefficient of variation (CV): 0.518573998
Kurtosis: -0.4108450181
Mean: 2653.210929
Median Absolute Deviation (MAD): 1060
Skewness: 0.07630074236
Sum: 5616940400
Variance: 1893062.833
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value | Count | Frequency (%)
240 | 4410 | 0.2%
3809 | 4256 | 0.2%
1044 | 3964 | 0.2%
3846 | 3949 | 0.2%
3842 | 3939 | 0.2%
2382 | 3925 | 0.2%
3812 | 3919 | 0.2%
3813 | 3820 | 0.2%
5855 | 3772 | 0.2%
3821 | 3769 | 0.2%
Other values (1581) | 2077312 | 98.1%

Minimum 10 values

Value | Count | Frequency (%)
2 | 930 | < 0.1%
4 | 581 | < 0.1%
5 | 544 | < 0.1%
6 | 657 | < 0.1%
7 | 454 | < 0.1%
8 | 316 | < 0.1%
9 | 892 | < 0.1%
10 | 276 | < 0.1%
11 | 366 | < 0.1%
12 | 307 | < 0.1%

Maximum 10 values

Value | Count | Frequency (%)
5855 | 3772 | 0.2%
5851 | 1133 | 0.1%
5850 | 1714 | 0.1%
5844 | 899 | < 0.1%
5843 | 1482 | 0.1%
5827 | 1666 | 0.1%
5825 | 2324 | 0.1%
5817 | 3211 | 0.2%
5815 | 2408 | 0.1%
5813 | 3276 | 0.2%

format
Categorical

HIGH CORRELATION

Distinct: 10
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 147.0 MiB

Value | Count
Grocery | 655581
AllInOne | 622612
Leftorium | 404589
GipsyTrade | 223873
Shop | 161863
Other values (5) | 48517

Length

Max length: 11
Median length: 8
Mean length: 7.8059777
Min length: 4

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Grocery
2nd row: Leftorium
3rd row: Leftorium
4th row: Grocery
5th row: Leftorium

Common Values

Value | Count | Frequency (%)
Grocery | 655581 | 31.0%
AllInOne | 622612 | 29.4%
Leftorium | 404589 | 19.1%
GipsyTrade | 223873 | 10.6%
Shop | 161863 | 7.6%
GreatShop | 37222 | 1.8%
TinyShop | 7812 | 0.4%
SmallShop | 3145 | 0.1%
Center | 287 | < 0.1%
SuperMarket | 51 | < 0.1%
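
Each frequency in the table above is the category count divided by the total number of observations. The sketch below recomputes the percentages for format. The formatting rule (one decimal place, with "< 0.1%" for non-zero shares that would otherwise round to 0.0%) is an assumption that appears consistent with the tables in this report; the helper name is hypothetical.

```python
# Counts copied from the Common Values table for format.
counts = {
    "Grocery": 655581,
    "AllInOne": 622612,
    "Leftorium": 404589,
    "GipsyTrade": 223873,
    "Shop": 161863,
    "GreatShop": 37222,
    "TinyShop": 7812,
    "SmallShop": 3145,
    "Center": 287,
    "SuperMarket": 51,
}
total = sum(counts.values())


def format_share(count, total):
    """Format count/total as a percentage with one decimal place."""
    text = f"{100 * count / total:.1f}%"
    # Assumed rule: a non-zero share that rounds to 0.0% is shown as "< 0.1%".
    if text == "0.0%" and count > 0:
        return "< 0.1%"
    return text


shares = {name: format_share(c, total) for name, c in counts.items()}
for name, share in shares.items():
    print(f"{name}: {share}")
```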

Length

Histogram of lengths of the category

Pie chart

Words

Value | Count | Frequency (%)
grocery | 655581 | 31.0%
allinone | 622612 | 29.4%
leftorium | 404589 | 19.1%
gipsytrade | 223873 | 10.6%
shop | 161863 | 7.6%
greatshop | 37222 | 1.8%
tinyshop | 7812 | 0.4%
smallshop | 3145 | 0.1%
center | 287 | < 0.1%
supermarket | 51 | < 0.1%


store_name
Categorical

HIGH CARDINALITY

Distinct: 1560
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 173.9 MiB

Value | Count
Last Golden Convenience | 5043
Bloody Chocolate AllInOne | 4787
Short Olive BigCom | 4730
Dull Navy BookShop | 4410
Empty Grey HairDresser | 4256
Other values (1555) | 2093809

Length

Max length: 32
Median length: 21
Mean length: 21.14351062
Min length: 10

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0

Unique

Unique: 4
Unique (%): < 0.1%

Sample

1st row: Bizarre Salmon Market
2nd row: Young Khaki TallMarket
3rd row: Big Salmon Deli
4th row: Bad Crimson Shop
5th row: Kicking Lemon LunchHall

Common Values

Value | Count | Frequency (%)
Last Golden Convenience | 5043 | 0.2%
Bloody Chocolate AllInOne | 4787 | 0.2%
Short Olive BigCom | 4730 | 0.2%
Dull Navy BookShop | 4410 | 0.2%
Empty Grey HairDresser | 4256 | 0.2%
Awesome Yellow Stationer | 4151 | 0.2%
Kicking Brown Convenience | 4063 | 0.2%
Dead Lemon Tobacconist | 3964 | 0.2%
Bizarre Salmon BookShop | 3949 | 0.2%
Angry Green Laundrette | 3939 | 0.2%
Other values (1550) | 2073743 | 98.0%

Length

Histogram of lengths of the category

Words

Value | Count | Frequency (%)
green | 122729 | 1.9%
blue | 110863 | 1.7%
shop | 81388 | 1.3%
stationer | 79612 | 1.2%
fishmonger | 77711 | 1.2%
florist | 76485 | 1.2%
young | 76140 | 1.2%
deli | 75790 | 1.2%
mall | 74590 | 1.2%
tallmarket | 74585 | 1.2%
Other values (102) | 5563907 | 86.7%
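
The word-frequency table above counts individual lowercase words rather than whole store names. The sketch below shows the idea on the five sample rows of store_name. Splitting on whitespace after lowercasing is an assumption about the tokenisation; the report's actual rule may also strip punctuation.

```python
from collections import Counter

# The five sample rows shown for store_name.
sample_rows = [
    "Bizarre Salmon Market",
    "Young Khaki TallMarket",
    "Big Salmon Deli",
    "Bad Crimson Shop",
    "Kicking Lemon LunchHall",
]

# Assumption: words are lowercased and split on whitespace.
word_counts = Counter(
    word for name in sample_rows for word in name.lower().split()
)

for word, count in word_counts.most_common(3):
    print(word, count)
```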


structure
Categorical

HIGH CORRELATION

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 144.9 MiB

Value | Count
METRO | 1135001
FORANEO | 726595
DISTRIBUIDORES | 255439

Length

Max length: 14
Median length: 5
Mean length: 6.772356621
Min length: 5

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: FORANEO
2nd row: METRO
3rd row: METRO
4th row: METRO
5th row: FORANEO

Common Values

Value | Count | Frequency (%)
METRO | 1135001 | 53.6%
FORANEO | 726595 | 34.3%
DISTRIBUIDORES | 255439 | 12.1%

Length

Histogram of lengths of the category

Pie chart

Words

Value | Count | Frequency (%)
metro | 1135001 | 53.6%
foraneo | 726595 | 34.3%
distribuidores | 255439 | 12.1%


division
Categorical

HIGH CORRELATION

Distinct: 11
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 157.8 MiB

Value | Count
METRO-NORTE | 597048
METRO-SUR | 537953
CENTRO | 233276
BAJÍO | 220487
CENTRO-SUR | 128802
Other values (6) | 399469

Length

Max length: 11
Median length: 9
Mean length: 8.709758696
Min length: 5

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: BAJÍO
2nd row: METRO-NORTE
3rd row: METRO-SUR
4th row: METRO-NORTE
5th row: BAJÍO

Common Values

Value | Count | Frequency (%)
METRO-NORTE | 597048 | 28.2%
METRO-SUR | 537953 | 25.4%
CENTRO | 233276 | 11.0%
BAJÍO | 220487 | 10.4%
CENTRO-SUR | 128802 | 6.1%
OCCIDENTE | 117924 | 5.6%
SURESTE | 94048 | 4.4%
PENÍNSULA | 72708 | 3.4%
NORESTE | 52688 | 2.5%
SUROESTE | 34513 | 1.6%

Length

Histogram of lengths of the category

Words

Value | Count | Frequency (%)
metro-norte | 597048 | 28.2%
metro-sur | 537953 | 25.4%
centro | 233276 | 11.0%
bajío | 220487 | 10.4%
centro-sur | 128802 | 6.1%
occidente | 117924 | 5.6%
sureste | 94048 | 4.4%
península | 72708 | 3.4%
noreste | 52688 | 2.5%
suroeste | 34513 | 1.6%


deposit/distributor
Categorical

HIGH CARDINALITY
HIGH CORRELATION

Distinct: 73
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 168.1 MiB

Value | Count
Emolor Vocals | 285854
Lorthogus Toughworks | 154174
Thrilthoal Empire | 153812
Thrilvitar Industrail | 124420
Ioalfio Ifonforge Industries | 120507
Other values (68) | 1278268

Length

Max length: 32
Median length: 17
Mean length: 18.26221012
Min length: 12

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Argule Ifonforge Industries
2nd row: Emolor Vocals
3rd row: Thrilthoal Empire
4th row: Lorthogus Toughworks
5th row: Argule Ifonforge Industries

Common Values

Value | Count | Frequency (%)
Emolor Vocals | 285854 | 13.5%
Lorthogus Toughworks | 154174 | 7.3%
Thrilthoal Empire | 153812 | 7.3%
Thrilvitar Industrail | 124420 | 5.9%
Ioalfio Ifonforge Industries | 120507 | 5.7%
Thrilvallor Corp. | 103629 | 4.9%
Argule Ifonforge Industries | 84057 | 4.0%
Rusrusmo Toughworks | 80754 | 3.8%
Erodmad Industrail | 72370 | 3.4%
Elortho Aerospace | 71482 | 3.4%
Other values (63) | 865976 | 40.9%

Length

Histogram of lengths of the category

Words

Value | Count | Frequency (%)
vocals | 502748 | 11.0%
industries | 320214 | 7.0%
ifonforge | 320214 | 7.0%
emolor | 285854 | 6.3%
toughworks | 285280 | 6.3%
industrail | 270849 | 5.9%
empire | 235293 | 5.2%
corp | 212078 | 4.7%
lorthogus | 154174 | 3.4%
thrilthoal | 153812 | 3.4%
Other values (72) | 1813768 | 39.8%


state
Categorical

HIGH CORRELATION

Distinct: 29
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 147.9 MiB

Value | Count
Alloralf | 691507
Magornmar | 554462
Lorvalmo | 92626
Thrilvallor | 90364
Erodmo | 77317
Other values (24) | 610759

Length

Max length: 11
Median length: 8
Mean length: 8.266879386
Min length: 5

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Lorvalmo
2nd row: Magornmar
3rd row: Magornmar
4th row: Alloralf
5th row: Lorvalmo

Common Values

Value | Count | Frequency (%)
Alloralf | 691507 | 32.7%
Magornmar | 554462 | 26.2%
Lorvalmo | 92626 | 4.4%
Thrilvallor | 90364 | 4.3%
Erodmo | 77317 | 3.7%
Egusgul | 72771 | 3.4%
Coalmar | 61526 | 2.9%
Lorgulgus | 56432 | 2.7%
Nabargus | 54921 | 2.6%
Alfrusma | 54086 | 2.6%
Other values (19) | 311023 | 14.7%

Length

Histogram of lengths of the category

Words

Value | Count | Frequency (%)
alloralf | 691507 | 32.7%
magornmar | 554462 | 26.2%
lorvalmo | 92626 | 4.4%
thrilvallor | 90364 | 4.3%
erodmo | 77317 | 3.7%
egusgul | 72771 | 3.4%
coalmar | 61526 | 2.9%
lorgulgus | 56432 | 2.7%
nabargus | 54921 | 2.6%
alfrusma | 54086 | 2.6%
Other values (19) | 311023 | 14.7%


municipality/city
Categorical

HIGH CARDINALITY

Distinct: 310
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 170.3 MiB

Value | Count
Enlightened Dark Empire | 513610
Evil Chocolate Rock | 79646
First Salmon Rock | 67596
Purple Creek | 66267
Long Lavender Butter | 55443
Other values (305) | 1334473

Length

Max length: 28
Median length: 20
Mean length: 19.32658411
Min length: 5

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: First Salmon Rock
2nd row: Enlightened Dark Empire
3rd row: Enlightened Dark Empire
4th row: Long Lavender Butter
5th row: First Salmon Rock

Common Values

Value | Count | Frequency (%)
Enlightened Dark Empire | 513610 | 24.3%
Evil Chocolate Rock | 79646 | 3.8%
First Salmon Rock | 67596 | 3.2%
Purple Creek | 66267 | 3.1%
Long Lavender Butter | 55443 | 2.6%
Last Pink River | 54959 | 2.6%
Horrible Chartreuse Mud | 51515 | 2.4%
Rare Lemon Key | 44927 | 2.1%
Clean Aqua Soil | 40750 | 1.9%
Angry Navy Empire | 40639 | 1.9%
Other values (300) | 1101683 | 52.0%

Length

Histogram of lengths of the category

Words

Value | Count | Frequency (%)
empire | 596058 | 9.5%
dark | 534725 | 8.5%
enlightened | 528657 | 8.4%
rock | 193255 | 3.1%
evil | 171744 | 2.7%
chocolate | 170371 | 2.7%
purple | 109814 | 1.7%
tree | 105747 | 1.7%
first | 99264 | 1.6%
long | 96855 | 1.5%
Other values (107) | 3680619 | 58.5%


supplier
Categorical

HIGH CORRELATION

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 153.4 MiB

Value | Count
proveedor_2 | 1917950
proveedor_1 | 199085

Length

Max length: 11
Median length: 11
Mean length: 11
Min length: 11

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: proveedor_2
2nd row: proveedor_2
3rd row: proveedor_2
4th row: proveedor_2
5th row: proveedor_2

Common Values

Value | Count | Frequency (%)
proveedor_2 | 1917950 | 90.6%
proveedor_1 | 199085 | 9.4%

Length

Histogram of lengths of the category

Pie chart

Words

Value | Count | Frequency (%)
proveedor_2 | 1917950 | 90.6%
proveedor_1 | 199085 | 9.4%


grade
Categorical

HIGH CORRELATION

Distinct: 6
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 134.9 MiB

Value | Count
IV | 1037575
V | 554462
III | 214383
VI | 186672
II | 119681

Length

Max length: 3
Median length: 2
Mean length: 1.837347517
Min length: 1

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: IV
2nd row: V
3rd row: V
4th row: IV
5th row: IV

Common Values

Value | Count | Frequency (%)
IV | 1037575 | 49.0%
V | 554462 | 26.2%
III | 214383 | 10.1%
VI | 186672 | 8.8%
II | 119681 | 5.7%
I | 4262 | 0.2%
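
The length statistics for grade can be recovered from the counts in the Common Values table, because every row with the same value has the same string length. The sketch below recomputes the mean length as a count-weighted average; only the counts are taken from the report.

```python
# Counts copied from the Common Values table for grade.
grade_counts = {
    "IV": 1037575,
    "V": 554462,
    "III": 214383,
    "VI": 186672,
    "II": 119681,
    "I": 4262,
}

n_rows = sum(grade_counts.values())

# Weighted mean of string lengths, weighting each value by its count.
total_chars = sum(len(value) * count for value, count in grade_counts.items())
mean_length = total_chars / n_rows

print(n_rows, round(mean_length, 9))
```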

Length

Histogram of lengths of the category

Pie chart

Words

Value | Count | Frequency (%)
iv | 1037575 | 49.0%
v | 554462 | 26.2%
iii | 214383 | 10.1%
vi | 186672 | 8.8%
ii | 119681 | 5.7%
i | 4262 | 0.2%


sku
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 34
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4525981.968
Minimum: 4220015
Maximum: 4829827
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 32.3 MiB

Quantile statistics

Minimum: 4220015
5-th percentile: 4230022
Q1: 4351043
Median: 4533931
Q3: 4739414
95-th percentile: 4809797
Maximum: 4829827
Range: 609812
Interquartile range (IQR): 388371

Descriptive statistics

Standard deviation: 204325.5328
Coefficient of variation (CV): 0.04514501699
Kurtosis: -1.399950746
Mean: 4525981.968
Median Absolute Deviation (MAD): 182888
Skewness: -0.03917794757
Sum: 9.581662236 × 10^12
Variance: 4.174892337 × 10^10
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=34)
Common values

Value | Count | Frequency (%)
4769520 | 139132 | 6.6%
4230022 | 129366 | 6.1%
4351043 | 127350 | 6.0%
4779537 | 119365 | 5.6%
4789544 | 99068 | 4.7%
4330800 | 97645 | 4.6%
4533931 | 97260 | 4.6%
4240039 | 97197 | 4.6%
4250046 | 93645 | 4.4%
4554143 | 92176 | 4.4%
Other values (24) | 1024831 | 48.4%

Minimum 10 values

Value | Count | Frequency (%)
4220015 | 57553 | 2.7%
4230022 | 129366 | 6.1%
4240039 | 97197 | 4.6%
4250046 | 93645 | 4.4%
4260053 | 8939 | 0.4%
4330800 | 97645 | 4.6%
4351043 | 127350 | 6.0%
4361050 | 79798 | 3.8%
4381425 | 90842 | 4.3%
4422187 | 19356 | 0.9%

Maximum 10 values

Value | Count | Frequency (%)
4829827 | 27367 | 1.3%
4819810 | 53972 | 2.5%
4809797 | 84737 | 4.0%
4789544 | 99068 | 4.7%
4779537 | 119365 | 5.6%
4769520 | 139132 | 6.6%
4749421 | 2987 | 0.1%
4739414 | 2752 | 0.1%
4729193 | 16665 | 0.8%
4719179 | 3264 | 0.2%

sku_description
Categorical

HIGH CORRELATION

Distinct: 34
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 178.6 MiB

Value | Count
Cornflakes Chocolate Choc Chip | 139132
Crunch Life Chocolate | 129366
Graham Mango Peach | 127350
Crunchy Bran Chocolate Choc Chip | 119365
Crunchy Nut Cornflakes Chocolate Choc Chip | 99068
Other values (29) | 1502754

Length

Max length: 42
Median length: 22
Mean length: 23.46612786
Min length: 15

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Muesli Chocolate
2nd row: Cups Mango Peach
3rd row: Fruit & Nut Chocolate Choc Chip
4th row: Graham Mango Peach
5th row: Crunch Berries Mango Peach

Common Values

Value | Count | Frequency (%)
Cornflakes Chocolate Choc Chip | 139132 | 6.6%
Crunch Life Chocolate | 129366 | 6.1%
Graham Mango Peach | 127350 | 6.0%
Crunchy Bran Chocolate Choc Chip | 119365 | 5.6%
Crunchy Nut Cornflakes Chocolate Choc Chip | 99068 | 4.7%
Crunch Mango Peach | 97645 | 4.6%
Cups Mango Peach | 97260 | 4.6%
Lucjy Charms Chocolate | 97197 | 4.6%
Muesli Chocolate | 93645 | 4.4%
Stars Mango Peach | 92176 | 4.4%
Other values (24) | 1024831 | 48.4%

Length

Histogram of lengths of the category

Words

Value | Count | Frequency (%)
chocolate | 1136530 | 14.6%
mango | 980505 | 12.6%
peach | 980505 | 12.6%
choc | 749830 | 9.7%
chip | 749830 | 9.7%
crunch | 318919 | 4.1%
crunchy | 295491 | 3.8%
bran | 281160 | 3.6%
cornflakes | 252655 | 3.3%
lucjy | 153795 | 2.0%
Other values (28) | 1865012 | 24.0%


sku_category
Categorical

HIGH CORRELATION

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 153.7 MiB

Value | Count
Sugar-free | 1329011
Super-flavour | 788024

Length

Max length: 13
Median length: 10
Mean length: 11.11669009
Min length: 10

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Sugar-free
2nd row: Super-flavour
3rd row: Sugar-free
4th row: Sugar-free
5th row: Sugar-free

Common Values

Value | Count | Frequency (%)
Sugar-free | 1329011 | 62.8%
Super-flavour | 788024 | 37.2%

Length

Histogram of lengths of the category

Pie chart

Words

Value | Count | Frequency (%)
sugar-free | 1329011 | 62.8%
super-flavour | 788024 | 37.2%


sku_subcategory
Categorical

HIGH CORRELATION

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 152.7 MiB

Value | Count
Cereal bars | 1329011
Cereal box | 788024

Length

Max length: 11
Median length: 11
Mean length: 10.62776997
Min length: 10

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Cereal bars
2nd row: Cereal box
3rd row: Cereal bars
4th row: Cereal bars
5th row: Cereal bars

Common Values

Value | Count | Frequency (%)
Cereal bars | 1329011 | 62.8%
Cereal box | 788024 | 37.2%

Length

Histogram of lengths of the category

Pie chart

Words

Value | Count | Frequency (%)
cereal | 2117035 | 50.0%
bars | 1329011 | 31.4%
box | 788024 | 18.6%


units
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 822
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 32.81549543
Minimum: 0.2
Maximum: 172
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 32.3 MiB

Quantile statistics

Minimum: 0.2
5-th percentile: 2.7
Q1: 10
Median: 20
Q3: 45.36
95-th percentile: 120
Maximum: 172
Range: 171.8
Interquartile range (IQR): 35.36

Descriptive statistics

Standard deviation: 35.03071821
Coefficient of variation (CV): 1.067505389
Kurtosis: 2.584247665
Mean: 32.81549543
Median Absolute Deviation (MAD): 13.25
Skewness: 1.736329198
Sum: 69471552.37
Variance: 1227.151218
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
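
The positive skewness (about 1.74) and kurtosis (about 2.58) reported for units point to a long right tail, which matches a mean of 32.8 sitting well above a median of 20. The kurtosis in this report reads as excess kurtosis: the near-uniform id column shows about -1.2, the value expected for a uniform distribution. The sketch below computes both quantities from central moments on toy data. It uses the plain population formulas, which is an assumption; pandas applies bias corrections that are negligible at two million rows. The function name and the sample values are illustrative only.

```python
def skewness_and_excess_kurtosis(xs):
    """Moment-based skewness and excess kurtosis (population formulas)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in xs) / n  # third central moment
    m4 = sum((x - mean) ** 4 for x in xs) / n  # fourth central moment
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3


# Toy data, not taken from the dataset.
right_tailed = [1, 1, 2, 2, 3, 4, 6, 12]
symmetric = [1, 2, 3, 4, 5]

skew_tail, kurt_tail = skewness_and_excess_kurtosis(right_tailed)
skew_sym, kurt_sym = skewness_and_excess_kurtosis(symmetric)
print(skew_tail > 0, skew_sym, kurt_sym)
```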
Common values

Value | Count | Frequency (%)
10.8 | 171931 | 8.1%
12 | 125606 | 5.9%
24 | 121115 | 5.7%
4.8 | 112482 | 5.3%
6.75 | 106746 | 5.0%
60 | 85394 | 4.0%
21.6 | 70503 | 3.3%
20 | 70028 | 3.3%
36 | 69455 | 3.3%
48 | 69374 | 3.3%
Other values (812) | 1114401 | 52.6%
ValueCountFrequency (%)
0.2658
 
< 0.1%
0.2520
 
< 0.1%
0.42414
0.1%
0.45556
 
< 0.1%
0.56
 
< 0.1%
0.61720
0.1%
0.756
 
< 0.1%
0.82024
0.1%
0.92617
0.1%
12016
0.1%
ValueCountFrequency (%)
17221
 
< 0.1%
17115
 
< 0.1%
170.11051
 
< 0.1%
17012
 
< 0.1%
16914
 
< 0.1%
168.757
 
< 0.1%
168.2129
 
< 0.1%
16810294
0.5%
16748
 
< 0.1%
166.32143
 
< 0.1%

unit_price
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 231
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 23.37807589
Minimum: 10.97
Maximum: 47.8
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 32.3 MiB

Quantile statistics

Minimum: 10.97
5-th percentile: 12.67
Q1: 13.74
Median: 15.08
Q3: 36.23
95-th percentile: 42.14
Maximum: 47.8
Range: 36.83
Interquartile range (IQR): 22.49

Descriptive statistics

Standard deviation: 11.79541157
Coefficient of variation (CV): 0.5045501444
Kurtosis: -1.514331558
Mean: 23.37807589
Median Absolute Deviation (MAD): 2.94
Skewness: 0.5524296635
Sum: 49492204.89
Variance: 139.131734
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value | Count | Frequency (%)
18.37 | 70410 | 3.3%
19.6 | 59446 | 2.8%
36.23 | 48205 | 2.3%
13.39 | 47212 | 2.2%
13.12 | 46290 | 2.2%
12.86 | 43829 | 2.1%
35.54 | 40200 | 1.9%
36.06 | 39858 | 1.9%
13.84 | 39587 | 1.9%
13.55 | 37771 | 1.8%
Other values (221) | 1644227 | 77.7%
ValueCountFrequency (%)
10.9714084
0.7%
11.0920298
1.0%
11.194956
 
0.2%
11.316629
 
0.3%
11.410846
0.5%
11.541965
 
0.1%
11.613
 
< 0.1%
11.625166
 
0.2%
11.796
 
< 0.1%
11.828295
0.4%
ValueCountFrequency (%)
47.81796
0.1%
47.35630
 
< 0.1%
46.96312
 
< 0.1%
46.92
 
< 0.1%
46.841924
0.1%
46.7323
 
< 0.1%
46.41
 
< 0.1%
46.36750
 
< 0.1%
45.471
 
< 0.1%
45.242487
0.1%

unit_margin
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 366
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 16.96744954
Minimum: 5.4
Maximum: 39.15
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 32.3 MiB

Quantile statistics

Minimum: 5.4
5-th percentile: 7.09
Q1: 9.67
median: 10.62
Q3: 27.28
95-th percentile: 32.78
Maximum: 39.15
Range: 33.75
Interquartile range (IQR): 17.61

Descriptive statistics

Standard deviation: 9.370304831
Coefficient of variation (CV): 0.5522518167
Kurtosis: -1.333476509
Mean: 16.96744954
Median Absolute Deviation (MAD): 1.86
Skewness: 0.6072504708
Sum: 35920684.53
Variance: 87.80261263
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
12.64 | 48447 | 2.3%
9.49 | 39587 | 1.9%
10.17 | 35512 | 1.7%
13.64 | 34380 | 1.6%
9.56 | 32877 | 1.6%
9.84 | 32501 | 1.5%
9.67 | 30277 | 1.4%
27.18 | 29576 | 1.4%
10 | 27717 | 1.3%
9.29 | 26761 | 1.3%
Other values (356) | 1779400 | 84.1%
ValueCountFrequency (%)
5.414084
0.7%
5.5120298
1.0%
5.555609
 
0.3%
5.624956
 
0.2%
5.736629
 
0.3%
5.7513
 
< 0.1%
5.782565
 
0.1%
5.825237
 
0.2%
5.946
 
< 0.1%
5.96846
 
< 0.1%
ValueCountFrequency (%)
39.15312
 
< 0.1%
39.041924
0.1%
38.55750
 
< 0.1%
38.221796
0.1%
37.77630
 
< 0.1%
37.661
 
< 0.1%
37.322
 
< 0.1%
37.12323
 
< 0.1%
36.821
 
< 0.1%
36.8675
 
< 0.1%

total
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 10633
Distinct (%): 0.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 595.3688936
Minimum: 4.505
Maximum: 3083.13
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 32.3 MiB

Quantile statistics

Minimum: 4.505
5-th percentile: 87.624
Q1: 190.416
median: 394.956
Q3: 814.104
95-th percentile: 1826.88
Maximum: 3083.13
Range: 3078.625
Interquartile range (IQR): 623.688

Descriptive statistics

Standard deviation: 556.1532946
Coefficient of variation (CV): 0.9341322675
Kurtosis: 2.090083149
Mean: 595.3688936
Median Absolute Deviation (MAD): 228.876
Skewness: 1.576944355
Sum: 1260416786
Variance: 309306.4871
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
123.9975 | 29716 | 1.4%
132.3 | 28268 | 1.3%
173.904 | 20806 | 1.0%
389.448 | 16004 | 0.8%
386.316 | 15852 | 0.7%
247.995 | 15548 | 0.7%
264.6 | 14144 | 0.7%
407.052 | 13650 | 0.6%
178.656 | 13435 | 0.6%
267.8 | 12900 | 0.6%
Other values (10623) | 1936712 | 91.5%
ValueCountFrequency (%)
4.5051
 
< 0.1%
4.59258
 
< 0.1%
4.732
 
< 0.1%
4.95
 
< 0.1%
7.10858
< 0.1%
7.24624
 
< 0.1%
7.27753
 
< 0.1%
7.3028
 
< 0.1%
7.44414
 
< 0.1%
7.548113
< 0.1%
ValueCountFrequency (%)
3083.132
 
< 0.1%
3082.90054
< 0.1%
3078.3871
 
< 0.1%
30787
< 0.1%
3075.2461
 
< 0.1%
3073.441
 
< 0.1%
3073.4283
< 0.1%
3072.3041
 
< 0.1%
3072.0064
< 0.1%
3071.7091
 
< 0.1%

total_margin
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 14531
Distinct (%): 0.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 423.6205214
Minimum: 3.0725
Maximum: 2529.792
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 32.3 MiB

Quantile statistics

Minimum: 3.0725
5-th percentile: 63.576
Q1: 133.65
median: 289.575
Q3: 583.8
95-th percentile: 1301.52
Maximum: 2529.792
Range: 2526.7195
Interquartile range (IQR): 450.15

Descriptive statistics

Standard deviation: 402.4650316
Coefficient of variation (CV): 0.950060281
Kurtosis: 2.826570142
Mean: 423.6205214
Median Absolute Deviation (MAD): 173.535
Skewness: 1.699498889
Sum: 896819470.6
Variance: 161978.1017
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
85.32 | 19095 | 0.9%
92.07 | 16462 | 0.8%
311.58 | 13650 | 0.6%
130.944 | 13435 | 0.6%
113.04 | 12360 | 0.6%
170.64 | 10853 | 0.5%
265.032 | 9821 | 0.5%
82.9575 | 9788 | 0.5%
113.88 | 9111 | 0.4%
227.76 | 8666 | 0.4%
Other values (14521) | 1993794 | 94.2%
ValueCountFrequency (%)
3.07251
 
< 0.1%
3.11251
 
< 0.1%
3.121
 
< 0.1%
3.166
 
< 0.1%
3.24752
 
< 0.1%
3.414
 
< 0.1%
3.42751
 
< 0.1%
4.57238
< 0.1%
4.7111
 
< 0.1%
4.953
 
< 0.1%
ValueCountFrequency (%)
2529.7923
 
< 0.1%
2525.0422
 
< 0.1%
2524.327
 
< 0.1%
2517.1211
 
< 0.1%
2498.042
 
< 0.1%
2496.31216
 
< 0.1%
2492.7662
 
< 0.1%
2485.6567
 
< 0.1%
2478.168253
< 0.1%
2472.87622
 
< 0.1%

Interactions

[Interaction plots: 64 images not preserved in this text version]

Correlations


Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better than Pearson's r at catching nonlinear monotonic correlations. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
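
The calculation described above can be sketched in plain Python. This is an illustrative implementation, not the code the report generator runs; the helper names `average_ranks`, `pearson_r` and `spearman_rho` are made up for this example, and tied values are assumed to receive the average of their rank positions. The `units` and `total` values are taken from the first five sample rows of this report.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of equal values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson_r(x, y):
    """Covariance divided by the product of the standard deviations."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman_rho(x, y):
    """Pearson correlation computed on the rank variables of x and y."""
    return pearson_r(average_ranks(x), average_ranks(y))


units = [12.00, 43.20, 27.00, 17.01, 13.50]
total = [166.56, 1557.792, 529.20, 224.1918, 247.995]
print(round(spearman_rho(units, total), 6))  # approximately 0.9

# A monotonic but nonlinear relationship: rho is (numerically) 1, r is below 1.
xs = [1, 2, 3, 4, 5]
cubes = [v ** 3 for v in xs]
print(round(spearman_rho(xs, cubes), 6), round(pearson_r(xs, cubes), 6))
```

The second example shows why ρ is preferred for monotonic but nonlinear relationships: the ranks of `xs` and `cubes` are identical even though the values do not lie on a straight line.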

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, so for a linear relationship the steepness of the line does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
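
As a minimal sketch of this formula, the function below computes r in plain Python and demonstrates the invariance under changes of location and scale. The function name `pearson_r` and the shift and scale constants are arbitrary choices for illustration; the `units` and `total` values come from the first five sample rows of this report.

```python
from statistics import mean


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    # The 1/(n-1) factors of the covariance and standard deviations cancel.
    return cov / (sx * sy)


units = [12.00, 43.20, 27.00, 17.01, 13.50]
total = [166.56, 1557.792, 529.20, 224.1918, 247.995]

r = pearson_r(units, total)

# Shift and rescale each variable separately (positive scale factors).
units_rescaled = [2.0 * u + 5.0 for u in units]
total_rescaled = [0.01 * t - 3.0 for t in total]
r_rescaled = pearson_r(units_rescaled, total_rescaled)

print(round(r, 6), round(r_rescaled, 6))  # the two values agree
```

Negating one of the variables flips the sign of r, which is why the invariance only holds for positive scale factors.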

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
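
The pair-counting definition above can be written out directly. This sketch follows the text literally (the variant usually called tau-a): tied pairs count as neither concordant nor discordant, and the divisor is the total number of pairs. Libraries often default to tie-adjusted variants, so their results may differ when the data contain ties. The function name `kendall_tau` is illustrative, and the data are the first five sample rows of this report.

```python
from itertools import combinations


def kendall_tau(x, y):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    pairs = list(combinations(range(len(x)), 2))
    concordant = discordant = 0
    for i, j in pairs:
        # Positive product: both variables move in the same direction.
        direction = (x[i] - x[j]) * (y[i] - y[j])
        if direction > 0:
            concordant += 1
        elif direction < 0:
            discordant += 1
    return (concordant - discordant) / len(pairs)


units = [12.00, 43.20, 27.00, 17.01, 13.50]
total = [166.56, 1557.792, 529.20, 224.1918, 247.995]

# 10 pairs in total: 9 concordant, 1 discordant.
print(kendall_tau(units, total))  # 0.8
```

The single discordant pair is the rows with 17.01 and 13.50 units: the row with more units has the smaller total because its unit price is lower.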

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use a bias-corrected measure proposed by Bergsma in 2013.
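
For illustration, the sketch below computes a bias-corrected Cramér's V from a contingency table, using the correction as it is commonly stated: φ² = χ²/n is reduced by (k-1)(r-1)/(n-1) and floored at zero, and the row and column counts r and k are each reduced by (r-1)²/(n-1) and (k-1)²/(n-1). Whether the report generator implements exactly these steps is an assumption. The function name and the example tables are hypothetical and are not taken from this dataset.

```python
import math


def cramers_v_corrected(table):
    """Bias-corrected Cramér's V for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    r, k = len(row_totals), len(col_totals)

    # Pearson chi-squared statistic against the independence expectation.
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:
                chi2 += (observed - expected) ** 2 / expected

    phi2 = chi2 / n
    # Bias corrections (assumed form, see the lead-in above).
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    return math.sqrt(phi2_corr / denom) if denom > 0 else 0.0


# Hypothetical counts: two categories that always occur together.
perfect = [[50, 0], [0, 50]]
# Hypothetical counts: two categories that are independent.
independent = [[25, 25], [25, 25]]

print(round(cramers_v_corrected(perfect), 6))
print(round(cramers_v_corrected(independent), 6))
```

With these tables the coefficient comes out at the two ends of its range: perfect association for the first table and independence for the second.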

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion by eye.

Sample

First rows

(index) | id | date | store_id | format | store_name | structure | division | deposit/distributor | state | municipality/city | supplier | grade | sku | sku_description | sku_category | sku_subcategory | units | unit_price | unit_margin | total | total_margin
0 | 0 | 2016-01-02 | 1292 | Grocery | Bizarre Salmon Market | FORANEO | BAJÍO | Argule Ifonforge Industries | Lorvalmo | First Salmon Rock | proveedor_2 | IV | 4250046 | Muesli Chocolate | Sugar-free | Cereal bars | 12.00 | 13.88 | 9.59 | 166.5600 | 115.0800
1 | 1 | 2016-01-02 | 2466 | Leftorium | Young Khaki TallMarket | METRO | METRO-NORTE | Emolor Vocals | Magornmar | Enlightened Dark Empire | proveedor_2 | V | 4533931 | Cups Mango Peach | Super-flavour | Cereal box | 43.20 | 36.06 | 24.54 | 1557.7920 | 1060.1280
2 | 2 | 2016-01-02 | 1460 | Leftorium | Big Salmon Deli | METRO | METRO-SUR | Thrilthoal Empire | Magornmar | Enlightened Dark Empire | proveedor_2 | V | 4819810 | Fruit & Nut Chocolate Choc Chip | Sugar-free | Cereal bars | 27.00 | 19.60 | 13.64 | 529.2000 | 368.2800
3 | 4 | 2016-01-02 | 3543 | Grocery | Bad Crimson Shop | METRO | METRO-NORTE | Lorthogus Toughworks | Alloralf | Long Lavender Butter | proveedor_2 | IV | 4351043 | Graham Mango Peach | Sugar-free | Cereal bars | 17.01 | 13.18 | 9.67 | 224.1918 | 164.4867
4 | 5 | 2016-01-02 | 3586 | Leftorium | Kicking Lemon LunchHall | FORANEO | BAJÍO | Argule Ifonforge Industries | Lorvalmo | First Salmon Rock | proveedor_2 | IV | 4422187 | Crunch Berries Mango Peach | Sugar-free | Cereal bars | 13.50 | 18.37 | 12.48 | 247.9950 | 168.4800
5 | 8 | 2016-01-02 | 2951 | AllInOne | Alive Azure Center | FORANEO | SURESTE | Eioval Aerospace | Coalmar | Rare Orange Applepie | proveedor_2 | VI | 4554143 | Stars Mango Peach | Sugar-free | Cereal bars | 12.00 | 14.19 | 9.84 | 170.2800 | 118.0800
6 | 9 | 2016-01-02 | 2666 | AllInOne | Awesome Silver Cafe | FORANEO | CENTRO-SUR | Erodmad Industrail | Egusgul | Dead Orange Ship | proveedor_2 | IV | 4533931 | Cups Mango Peach | Super-flavour | Cereal box | 21.60 | 36.06 | 24.54 | 778.8960 | 530.0640
7 | 10 | 2016-01-02 | 2219 | Leftorium | Funny Pink Jeweller | METRO | METRO-NORTE | Ioarco Corp. | Alloralf | Tiny Chocolate River | proveedor_2 | IV | 4789544 | Crunchy Nut Cornflakes Chocolate Choc Chip | Super-flavour | Cereal box | 43.20 | 35.77 | 28.77 | 1545.2640 | 1242.8640
8 | 11 | 2016-01-02 | 3798 | AllInOne | Enlightened Green Fishmonger | METRO | METRO-NORTE | Emolor Vocals | Alloralf | Last Pink River | proveedor_2 | IV | 4665676 | Crunch Chocolate Choc Chip | Super-flavour | Cereal box | 32.40 | 37.69 | 28.85 | 1221.1560 | 934.7400
9 | 13 | 2016-01-02 | 3799 | AllInOne | Enlightened Olive Florist | METRO | METRO-SUR | Thrilvitar Industrail | Magornmar | Enlightened Dark Empire | proveedor_2 | V | 4665676 | Crunch Chocolate Choc Chip | Super-flavour | Cereal box | 54.00 | 37.69 | 28.85 | 2035.2600 | 1557.9000

Last rows

(index) | id | date | store_id | format | store_name | structure | division | deposit/distributor | state | municipality/city | supplier | grade | sku | sku_description | sku_category | sku_subcategory | units | unit_price | unit_margin | total | total_margin
2117025 | 2501718 | 2017-12-05 | 2591 | AllInOne | Cyan BigCom | FORANEO | SURESTE | Malormad Corp. | Thogulgul | Awesome Blue City | proveedor_2 | II | 4809797 | Fruit & Bran Chocolate Choc Chip | Sugar-free | Cereal bars | 6.75 | 19.29 | 13.36 | 130.2075 | 90.18
2117026 | 2501719 | 2017-12-05 | 1008 | AllInOne | Dangerous Pink Stationer | DISTRIBUIDORES | CENTRO | Movitalf Empire | Alloralf | Long Chocolate Barrier | proveedor_2 | IV | 4250046 | Muesli Chocolate | Sugar-free | Cereal bars | 48.00 | 15.17 | 10.66 | 728.1600 | 511.68
2117027 | 2501720 | 2017-12-05 | 1653 | Grocery | Tiny Silver GreatShop | DISTRIBUIDORES | BAJÍO | Rusloralf Inc. | Erodmo | Angry Fuchsia Bay | proveedor_2 | III | 4769520 | Cornflakes Chocolate Choc Chip | Super-flavour | Cereal box | 1.60 | 42.58 | 32.55 | 68.1280 | 52.08
2117028 | 2501722 | 2017-12-05 | 2159 | Grocery | Legendary Magenta Butcher | DISTRIBUIDORES | CENTRO | Movitalf Empire | Alloralf | Long Chocolate Barrier | proveedor_2 | IV | 4230022 | Crunch Life Chocolate | Sugar-free | Cereal bars | 10.00 | 13.88 | 9.81 | 138.8000 | 98.10
2117029 | 2501723 | 2017-12-05 | 240 | Shop | Dull Navy BookShop | METRO | METRO-SUR | Thrilthoal Empire | Alloralf | Rare Lemon Key | proveedor_1 | IV | 4543948 | Puffs Mango Peach | Super-flavour | Cereal box | 5.40 | 43.25 | 31.90 | 233.5500 | 172.26
2117030 | 2501724 | 2017-12-05 | 1672 | Grocery | Awful Red Convenience | METRO | METRO-SUR | Ioalfio Ifonforge Industries | Magornmar | Enlightened Dark Empire | proveedor_2 | V | 4769520 | Cornflakes Chocolate Choc Chip | Super-flavour | Cereal box | 14.40 | 42.58 | 32.55 | 613.1520 | 468.72
2117031 | 2501725 | 2017-12-05 | 1075 | GipsyTrade | Kicking Navy Florist | FORANEO | CENTRO | Elortho Aerospace | Alloralf | Awful Beige | proveedor_2 | IV | 4789544 | Crunchy Nut Cornflakes Chocolate Choc Chip | Super-flavour | Cereal box | 13.50 | 42.75 | 35.06 | 577.1250 | 473.31
2117032 | 2501726 | 2017-12-05 | 3995 | Grocery | Empty Sky Blue GipsyTrade | FORANEO | CENTRO-SUR | Gorngusco Toughworks | Lorgulgus | Horrible Purple Tree | proveedor_2 | IV | 4442378 | Crunchy Bran Mango Peach | Sugar-free | Cereal bars | 156.00 | 11.82 | 5.98 | 1843.9200 | 932.88
2117033 | 2501728 | 2017-12-05 | 2708 | Grocery | Long Chartreuse Stationer | METRO | METRO-SUR | Thrilthoal Empire | Magornmar | Enlightened Dark Empire | proveedor_2 | V | 4330800 | Crunch Mango Peach | Sugar-free | Cereal bars | 24.00 | 14.88 | 10.02 | 357.1200 | 240.48
2117034 | 2501729 | 2017-12-05 | 1779 | Grocery | Funny Sea Green AllInOne | METRO | METRO-SUR | Barmaralf Inc. | Alloralf | New Aqua Jungle | proveedor_2 | IV | 4230022 | Crunch Life Chocolate | Sugar-free | Cereal bars | 16.00 | 13.88 | 9.81 | 222.0800 | 156.96
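
In the sample rows, total and total_margin appear to equal units multiplied by unit_price and unit_margin respectively, which would help explain the high correlations flagged between units, total and total_margin. This is an observation drawn from the listed rows, not something the report states. A minimal sketch that checks it on a few of them (the tuple layout and the function name `is_consistent` are illustrative):

```python
import math

# (units, unit_price, unit_margin, total, total_margin) from the sample rows.
rows = [
    (12.00, 13.88, 9.59, 166.5600, 115.0800),
    (43.20, 36.06, 24.54, 1557.7920, 1060.1280),
    (17.01, 13.18, 9.67, 224.1918, 164.4867),
    (6.75, 19.29, 13.36, 130.2075, 90.18),
    (156.00, 11.82, 5.98, 1843.9200, 932.88),
]


def is_consistent(row, tol=1e-6):
    """Check total == units * unit_price and total_margin == units * unit_margin."""
    units, unit_price, unit_margin, total, total_margin = row
    return (
        math.isclose(units * unit_price, total, abs_tol=tol)
        and math.isclose(units * unit_margin, total_margin, abs_tol=tol)
    )


for row in rows:
    print(row[0], is_consistent(row))
```

An absolute tolerance is used because the products of decimal values are not exactly representable in floating point.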